Abstract: Convolutive source separation is often done in two stages: 1) estimation of the mixing filters and 2) estimation of the sources. Traditional approaches suffer from the ambiguities of arbitrary permutations and scaling in each frequency bin of the estimated filters and/or the sources, and these ambiguities are usually corrected by taking into account some special properties of the filters/sources. This paper focuses on the filter permutation problem in the absence of scaling, investigating the possible use of the temporal sparsity of the filters as a property enabling permutation correction. Theoretical and experimental results highlight the potential as well as the limits of sparsity as a hypothesis to obtain a well-posed permutation problem.

Key-words: sparse filter, convolutive blind source separation, permutation ambiguity, $\ell^p$ minimization, Hall's Marriage Theorem, bi-stochastic matrix
Well-posedness of the permutation problem for the estimation of sparse filters by $\ell^p$ minimization

Summary: Source separation of convolutive mixtures is often carried out in two steps: 1) estimation of the mixing filters and 2) estimation of the sources. Classical approaches suffer from arbitrary permutation and scaling ambiguities at each frequency of the estimated filters and/or sources. These ambiguities are usually corrected by taking into account particular properties of the filters/sources. This article focuses on the filter permutation problem in the absence of scaling, exploring the potential use of the temporal sparsity of the filters to solve the permutation problem. Theoretical and experimental results highlight both the potential and the limits of the sparsity hypothesis for obtaining a well-posed problem.

Keywords: sparse filters, blind source separation, convolutive mixture, permutation ambiguity, Hall's marriage theorem, bi-stochastic matrix
1 Introduction

Blind source separation and blind source localization are ubiquitous problems in 
signal processing, with applications ranging from wireless telecommunications 
to underwater acoustics and sound enhancement. 

These problems can be considered reasonably well understood and solved in simple linear instantaneous settings, where tools such as Independent Component Analysis, as well as techniques exploiting source sparsity, are now mature. However, the convolutive source localization/separation problem remains much more challenging. In particular, without any assumption beyond statistical independence between sources, the problem is known to be ill-posed because of the so-called frequency permutation (and scaling) problem: at best, one can hope to estimate, for each frequency (up to a source- and frequency-dependent scaling factor), the collection of frequency components of all sources (and of the associated mixing filters); but one cannot match the estimated frequency components from different subbands to globally identify the sources (and mixing filters).

Several practical approaches have been proposed to solve the permutation and scaling problems, by exploiting various properties of either the mixing filters or the sources to match different frequency subbands. While some of these methods may succeed in practice for certain types of sources/filters, there is no known theory guaranteeing the well-posedness of the permutation and scaling problem under appropriate assumptions.

This paper contributes to filling this gap by providing well-posedness guarantees for the permutation problem under sparsity assumptions on the mixing filters. Sparse filters, associated with impulse responses corresponding to a limited set of echoes, are typically encountered in a number of underwater communication channels [1] or wireless telecommunications scenarios [3] which are relevant for blind source localization and separation, and the theoretical results achieved in this paper indicate that this property can potentially be exploited for blind estimation in this context.

1.1 Problem formulation and notations 

Let $x_i[t]$, $1 \le i \le M$, be $M$ mixtures of $N$ source signals $s_j[t]$, resulting from the convolution with filters $a_{ij}[t]$ of length $L$ such that:

$$x_i[t] = \sum_{j=1}^{N} (a_{ij} \star s_j)[t], \quad 1 \le i \le M, \qquad (1)$$

where $\star$ denotes convolution. The filter $a_{ij}[t]$ typically models the impulse response between the $j$-th source and the $i$-th sensor. By abuse of notation, $F a_{ij} = \{a_{ij}[\omega]\}_{0 \le \omega < L}$ denotes the discrete Fourier transform of the filter seen as a vector $a_{ij} = \{a_{ij}[t]\}_{0 \le t < L} \in \mathbb{C}^L$. Also, the mixing equation (1) can be rewritten as $X = A \star S$, with $A$ the matrix of filters

$$A := \left(\{a_{ij}[t]\}_{0 \le t < L}\right)_{1 \le i \le M,\ 1 \le j \le N},$$

$X$ the observation matrix and $S$ the source matrix.

In this context, blind filter estimation refers to the problem of obtaining estimates of the filters $A$ from the mixtures $X$, without any explicit knowledge about the sources $S$. Mixing filter estimation is relevant for several purposes such as deconvolution, source localization, etc. [4]. It is also related to the problem of Multiple-Input Multiple-Output (MIMO) system identification in communications engineering [5].

RR n° 7782

Alexis Benichoux, Prasad Sudhakar, Frederic Bimbot, and Remi Gribonval

1.2 Frequency domain filter estimation 

Estimating the mixing parameters is made easier when all filters are instantaneous, that is to say of length $L = 1$, as the convolution product in (1) is replaced by the usual product. However, things get complicated in the general setting of convolutive mixtures.

A common approach for filter estimation then relies on the transformation of the mixing model in Eq. (1) into the time-frequency domain, converting a single convolutive filter estimation problem into several complex instantaneous filter estimation problems. Using standard techniques for instantaneous mixing parameter estimation [6], complex mixing filter coefficients

$$A[\omega] = \{a_{ij}[\omega]\}_{1 \le i \le M,\ 1 \le j \le N}$$

are estimated for each frequency bin $0 \le \omega < L$.

1.3 Permutation and scaling ambiguities 

Without further assumption on either the filters $a_{ij}[t]$ or the sources $s_j[t]$, one can at best hope to find an estimate $\hat{A} = (\hat{a}_{ij})$ where for every frequency $\omega$ we have

$$\hat{a}_{ij}[\omega] = \lambda_j[\omega]\, a_{i\sigma_\omega(j)}[\omega], \qquad (2)$$

with $\lambda_j[\omega]$ a scaling ambiguity and $\sigma_\omega \in \mathfrak{S}_N$ a permutation ambiguity, where $\mathfrak{S}_N$ is the set of permutations of the integers between one and $N$. Several methods [7] attempt to resolve these ambiguities by exploiting properties of either the sources $S$ or the filters $A$ [8], [9], [10].

1.4 Exploiting sparsity to solve the permutation ambiguity

The focus of this article is the use of the sparsity of $A$ in the time domain to find $\sigma_0, \dots, \sigma_{L-1} \in \mathfrak{S}_N$, assuming the scaling is solved, i.e., $\lambda_j[\omega] = 1$.

Assuming that $A$ is sparse means that each filter $a_{ij}$ has few nonzero coefficients, as measured by the $\ell^0$ pseudo-norm

$$\|a_{ij}\|_0 := \sharp\{0 \le t < L : a_{ij}[t] \neq 0\} = \sum_t |a_{ij}[t]|^0.$$

The approach considered in this article is to seek permutations $\sigma_0, \dots, \sigma_{L-1}$ yielding the sparsest estimated time-domain matrix of filters $\tilde{A} = (\tilde{a}_{ij})$, where $\tilde{a}_{ij}[\omega] := \hat{a}_{i\sigma_\omega(j)}[\omega]$. Besides the $\ell^0$ pseudo-norm $\|A\|_0 := \sum_{ij} \|a_{ij}\|_0$, the following $\ell^p$ quasi-norms will be used to quantify the sparsity of $A$:

$$\|A\|_p^p := \sum_{ij} \|a_{ij}\|_p^p = \sum_{ijt} |a_{ij}[t]|^p, \quad 0 < p \le 1.$$
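As a minimal sketch (assuming NumPy, and storing a filter matrix as an array of shape (M, N, L) whose entry [i, j, t] is $a_{ij}[t]$), the sparsity measures above can be computed as follows. The function names and the numerical tolerance used to count "nonzero" coefficients are illustrative assumptions, not part of the report; any $p > 0$ is accepted because Section 5 also uses $p \ge 1$.

```python
import numpy as np

def l0_pseudo_norm(A, tol=1e-10):
    """||A||_0 = sum_ij ||a_ij||_0: number of coefficients with |a_ij[t]| > tol.

    A has shape (M, N, L). The tolerance is an assumption needed to
    count nonzero entries of floating-point data.
    """
    return int(np.count_nonzero(np.abs(A) > tol))

def lp_quasi_norm_p(A, p):
    """||A||_p^p = sum_{i,j,t} |a_ij[t]|^p, for p > 0."""
    if p <= 0:
        raise ValueError("p must be positive; use l0_pseudo_norm for p = 0")
    return float(np.sum(np.abs(A) ** p))

# Example: one channel, two 1-sparse filters of length 5.
A = np.zeros((1, 2, 5))
A[0, 0, 1] = 3.0
A[0, 1, 4] = -4.0
print(l0_pseudo_norm(A))          # 2
print(lp_quasi_norm_p(A, 1.0))    # 7.0
```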
1.5 Main results

Our main result (Theorem 2) is a theoretical guarantee that when the filter length $L$ is prime, $k$-sparse filters (i.e., such that $\|a_{ij}\|_0 \le k$) uniquely minimize the $\ell^0$ norm of $\tilde{A}$ (up to a global permutation) if $k/L < \alpha(N)$, where $N$ is the number of sources. To reach this bound we exploit uncertainty principles as well as the bi-stochastic structure of the problem through an apparently new quantitative result on bi-stochastic matrices (Lemma 2).

1.6 Structure of the paper 

The main theorems are stated in Section 2 and the main ingredients of their proofs are described in Section 3. In Section 4 we discuss the strength of the assumptions used in the theorems, and how much these could be relaxed. In Section 5 a naive combinatorial $\ell^p$ minimization algorithm is proposed to resolve filter permutations and used for Monte-Carlo simulations. We conclude with a discussion of the potential, as well as the limits, of sparsity as a hypothesis to solve permutation problems, in connection with the theoretical and empirical results. All proofs are gathered in the appendix.

2 Theoretical guarantees 

Given an $M \times N$ filter matrix $A$, made of filters of length $L$, and an $L$-tuple $(\sigma_0, \dots, \sigma_{L-1}) \in \mathfrak{S}_N^L$ of permutations, we let $\tilde{A}$ be the matrix obtained from $A$ by applying the permutations in the frequency domain, as in (2), without scaling ($\lambda_j[\omega] = 1$).

The effect of the permutations is said to coincide with that of a global permutation $\pi \in \mathfrak{S}_N$ of the columns of $A$ if $\tilde{a}_{ij} = a_{i\pi(j)}$, $\forall i, j$, or equivalently in the frequency domain:

$$\tilde{a}_{ij}[\omega] := a_{i\sigma_\omega(j)}[\omega] = a_{i\pi(j)}[\omega], \quad 0 \le \omega < L,\ \forall i, j.$$

This is denoted $\tilde{A} \sim A$. First, we show that for filters with disjoint time-domain supports, permutations cannot decrease the $\ell^p$ norm, $0 < p \le 1$:

Theorem 1 Let $\Gamma_{ij} \subset \{0, \dots, L-1\}$ be the time-domain support of $a_{ij}$. Suppose that for all $i$ and $j_1 \neq j_2$ we have

$$\Gamma_{ij_1} \cap \Gamma_{ij_2} = \emptyset. \qquad (3)$$

Then, for $0 < p \le 1$, we have $\|\tilde{A}\|_p \ge \|A\|_p$.
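The frequency-domain permutation step, together with a numerical illustration of Theorem 1, can be sketched as follows. This assumes NumPy's unnormalized FFT for $F$ (the $\ell^p$ norms are computed in the time domain, so the DFT normalization does not matter), 0-based source indices, and a hypothetical helper name `apply_frequency_permutations`; the random instance is only an illustration, not an experiment from the report.

```python
import numpy as np

def apply_frequency_permutations(A, sigmas):
    """Return tilde A with tilde a_ij[w] = a_{i, sigma_w(j)}[w] (no scaling).

    A: time-domain filters, shape (M, N, L).
    sigmas: integer array of shape (L, N); row w is the permutation sigma_w.
    The result is complex in general, since arbitrary permutations do not
    preserve the Hermitian symmetry of the spectra of real filters.
    """
    F = np.fft.fft(A, axis=-1)
    F_perm = np.empty_like(F)
    for w in range(A.shape[-1]):
        F_perm[:, :, w] = F[:, sigmas[w], w]   # column j <- column sigma_w(j)
    return np.fft.ifft(F_perm, axis=-1)

# Illustration of Theorem 1: within each row i, the supports are disjoint.
rng = np.random.default_rng(0)
M, N, L = 2, 3, 12
A = np.zeros((M, N, L))
for i in range(M):
    supports = np.array_split(rng.permutation(L), N)  # disjoint time supports
    for j in range(N):
        t = supports[j][:2]                            # 2-sparse filters
        A[i, j, t] = rng.standard_normal(2)

sigmas = np.array([rng.permutation(N) for _ in range(L)])
A_tilde = apply_frequency_permutations(A, sigmas)
for p in (0.5, 1.0):
    lhs = np.sum(np.abs(A_tilde) ** p)
    rhs = np.sum(np.abs(A) ** p)
    print(p, lhs >= rhs - 1e-9)   # Theorem 1 predicts True for 0 < p <= 1
```

Applying the same permutation at every frequency reproduces a global column permutation, which is a convenient sanity check of the indexing convention.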

Note that filters with disjoint supports need not be very sparse: $M$ filters of length $L$ can have disjoint supports provided that each of them has at most $L/M$ nonzero coefficients. Yet, disjointness of filter supports is a strong assumption, and Theorem 1 only indicates that frequency permutations cannot decrease the $\ell^p$ norm. Thus, the minimum value of the $\ell^p$ norm might not be uniquely achieved (up to a global permutation). In our main result, we consider $k$-sparse filters of prime length, and $p = 0$:
Theorem 2 Let $A$ be an $M \times N$ matrix of filters of prime length $L$. Assume that

$$\max_{ij} \|a_{ij}\|_0 \le k, \qquad (4)$$

and

$$\frac{k}{L} < \alpha(N) := \begin{cases} \dfrac{2}{N(N+2)} & \text{if } N \text{ is even},\\[2mm] \dfrac{2}{(N+1)^2} & \text{if } N \text{ is odd}. \end{cases} \qquad (5)$$

Then, up to a global permutation, $A$ uniquely minimises the $\ell^0$ pseudo-norm among all possible frequency permutations.
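For concreteness, the constant $\alpha(N)$ in (5) can be tabulated with the small Python sketch below (function names are illustrative). It checks the closed form against the expression $2\alpha(N) = 1/\max_{1 \le k_0 < N}(k_0+1)(N-k_0)$ used in the proof of Lemma 2, compares it with the pigeonhole constant $\alpha'(N) = 1/(2N!)$ of Section 3.3, and returns the largest sparsity $k$ allowed by (5) for a given $L$.

```python
import math
from fractions import Fraction

def alpha(N):
    """alpha(N) of Theorem 2, eq. (5), as an exact fraction (N >= 2)."""
    if N % 2 == 0:
        return Fraction(2, N * (N + 2))
    return Fraction(2, (N + 1) ** 2)

def alpha_from_max(N):
    """Same constant via 2 alpha(N) = 1 / max_{1 <= k0 < N} (k0 + 1)(N - k0)."""
    best = max((k0 + 1) * (N - k0) for k0 in range(1, N))
    return Fraction(1, 2 * best)

def alpha_weak(N):
    """Pigeonhole constant alpha'(N) = 1 / (2 N!) from Section 3.3."""
    return Fraction(1, 2 * math.factorial(N))

def max_sparsity(N, L):
    """Largest integer k with k / L < alpha(N), i.e. the k allowed by (5)."""
    return math.ceil(alpha(N) * L) - 1

for N in range(2, 6):
    print(N, alpha(N), alpha_weak(N), max_sparsity(N, 131))
```

Exact fractions avoid rounding issues when $\alpha(N) L$ is an integer, where the strict inequality in (5) matters.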



3 Main elements of the proof of Theorem 2

The proof of Theorem 2 relies on a measure of the "amount" of incurred permutation, on uncertainty principles, and on combinatorial arguments related to bi-stochastic matrices, involving Hall's Marriage Theorem.

3.1 Measures of the amount of incurred permutations 

To measure the "amount" of incurred permutation, one can count the number of frequency bands where a non-trivial permutation is incurred, with respect to the best matching reference global permutation $\pi$, i.e., $\min_\pi \sharp\{\omega : \sigma_\omega \neq \pi\}$. However, this generally yields the maximum count $L - 1$.

An alternative is to count the "size" of the incurred permutations, given a reference global permutation $\pi$, as the maximum number of frequencies where each estimated filter actually differs from the (globally permuted) original filters, yielding:

$$\Delta(\tilde{A}, A \mid \pi) := \max_{ij} \|F(\tilde{a}_{ij} - a_{i\pi(j)})\|_0, \qquad (6)$$

$$\Delta(\tilde{A}, A) := \min_{\pi} \Delta(\tilde{A}, A \mid \pi). \qquad (7)$$

Note that $\Delta(\tilde{A}, A) = 0$ iff $\tilde{A} \sim A$.
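Definitions (6) and (7) translate directly into code. The sketch below assumes NumPy, the array layout (M, N, L) for time-domain filters, a brute-force search over the $N!$ global permutations (only practical for small $N$), and a numerical tolerance for the $\ell^0$ count; the function names are hypothetical.

```python
import itertools
import numpy as np

def delta_given_pi(A_tilde, A, pi, tol=1e-9):
    """Delta(tilde A, A | pi) of eq. (6): max_{i,j} ||F(tilde a_ij - a_{i,pi(j)})||_0."""
    diff = np.fft.fft(A_tilde - A[:, list(pi), :], axis=-1)
    counts = np.count_nonzero(np.abs(diff) > tol, axis=-1)   # shape (M, N)
    return int(counts.max())

def delta(A_tilde, A, tol=1e-9):
    """Delta(tilde A, A) of eq. (7): minimum over all global permutations pi."""
    N = A.shape[1]
    return min(delta_given_pi(A_tilde, A, pi, tol)
               for pi in itertools.permutations(range(N)))

# Example: swapping sources 0 and 1 at a single frequency gives Delta = 1.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3, 7))
F = np.fft.fft(A, axis=-1)
F[:, [0, 1], 3] = F[:, [1, 0], 3]
A_tilde = np.fft.ifft(F, axis=-1)
print(delta(A_tilde, A))   # 1
```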

3.2 Exploitation of an uncertainty principle 

With this notation, we have the following lemma:

Lemma 1 Assume that $\tilde{A} \not\sim A$, that $L$ is a prime integer, and that (4) holds with

$$2k + \Delta(\tilde{A}, A) < L. \qquad (8)$$

Then $\|\tilde{A}\|_0 > \|A\|_0$ and $\|\tilde{a}_{ij}\|_0 \ge \|a_{ij}\|_0$, $\forall i, j$. The latter inequality is strict when $\tilde{a}_{ij} \neq a_{ij}$. For a general $L$ (not necessarily prime), the same conclusions hold when assumption (8) is replaced with

$$2k \cdot \Delta(\tilde{A}, A) < L. \qquad (9)$$

The skilled reader will rightly sense the role of uncertainty principles [11], [12], [13, Theorem 1] in the above lemma.
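The key step can be made explicit in a short derivation; this is a sketch of how the uncertainty principles enter, not the report's full proof of Lemma 1. It assumes the Donoho-Stark bound $\|u\|_0 \|Fu\|_0 \ge L$ for nonzero $u \in \mathbb{C}^L$ and, for prime $L$, Tao's bound $\|u\|_0 + \|Fu\|_0 \ge L + 1$. Take $(i, j)$ with $\tilde{a}_{ij} \neq a_{ij}$ after the best global permutation has been applied, and set $u := \tilde{a}_{ij} - a_{ij} \neq 0$, so that $1 \le \|Fu\|_0 \le \Delta(\tilde{A}, A)$ by (6)-(7) and $\|a_{ij}\|_0 \le k$ by (4):

$$\|u\|_0 \ge \begin{cases} L + 1 - \Delta > 2k + 1 & \text{if } L \text{ is prime and (8) holds},\\[1mm] L / \Delta > 2k & \text{if (9) holds}, \end{cases}$$

$$\|\tilde{a}_{ij}\|_0 \ge \|u\|_0 - \|a_{ij}\|_0 > 2k - k = k \ge \|a_{ij}\|_0.$$

The filters with $\tilde{a}_{ij} = a_{ij}$ contribute equally to both sides, so summing over $(i, j)$ gives $\|\tilde{A}\|_0 > \|A\|_0$ as soon as one filter differs.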
3.3 Combinatorial arguments

Using Lemma 1 with prime $L$, a simple combinatorial argument can be used to obtain a weakened version of Theorem 2 with the more conservative constant $\alpha'(N) := 1/(2N!)$: by the pigeonhole principle, for any $L$-tuple of frequency permutations among $N$ sources, at least $L/N!$ permutations are identical; as a result, $\Delta(\tilde{A}, A)$ is universally bounded from above by $L - L/N!$; hence if $k < L/(2N!)$ we obtain $2k + \Delta < L$ and we can conclude thanks to Lemma 1.

The proof of Theorem 2 with the constant $\alpha(N)$ exploits a stronger universal upper bound $\Delta(\tilde{A}, A) \le L(1 - 2\alpha(N))$, obtained through an apparently new quantitative application of Hall's Marriage Theorem [14] to bi-stochastic matrices.

Definition 1 (Bi-stochastic matrix) An $N \times N$ matrix $B$ is called bi-stochastic if all its entries are non-negative, and the sum of the entries over each row as well as the sum of the entries over each column is one.

Lemma 2 Let $B$ be an $N \times N$ bi-stochastic matrix: there exists a permutation matrix $P$ such that all the entries of $B$ on the support of $P$ reach at least the threshold $2\alpha(N)$.

Corollary 1 Let $\sigma_0, \dots, \sigma_{L-1} \in \mathfrak{S}_N$ be $L$ permutations. There exists a global permutation $\pi$ such that

$$\sharp\{0 \le \omega < L : \sigma_\omega(j) = \pi(j)\} \ge 2L\alpha(N), \quad \forall 1 \le j \le N. \qquad (10)$$

4 Discussion

The reader may have noticed that Theorem 2, while dropping the disjoint support assumption from Theorem 1, introduces new restrictions: the assumption that $L$ is prime, and the restriction to $p = 0$ compared to $0 < p \le 1$ in Theorem 1. How important are these restrictions? Could they be relaxed while exploiting sparsity together with the disjoint support assumption? This is discussed in this section.

4.1 Extending Theorem 2 to non-prime filter length $L$?

As indicated by Lemma 3 below, for even $L \ge 4$, there exist sparse matrices of filters that are the sparsest but not unique (even up to a global permutation) solution of the considered problem: certain frequency permutations provide an equally sparse but not equivalent solution.

Lemma 3 For any integer $k$ such that $2k$ divides $L$, there exist a matrix of $k$-sparse filters $A$ and a set of $L/2k$ frequency permutations resulting in $\tilde{A} \not\sim A$ such that for all $0 \le p < \infty$: $\|\tilde{A}\|_p = \|A\|_p$, and

$$\|\tilde{a}_{ij}\|_p = \|a_{ij}\|_p, \quad \forall i, j. \qquad (11)$$

We have $2k \cdot \Delta(\tilde{A}, A) = L$.
The fact that the filter matrices $\tilde{A}$ and $A$ satisfy $2k \cdot \Delta(\tilde{A}, A) = L$ shows the sharpness of Lemma 1 for the case when $L$ is even: the strict inequality in (9) cannot be improved.

Specializing Lemma 3 to $k = 1$ for even $L \ge 4$ yields maximally sparse (1-sparse) filters $a_{ij}$ and a set of $L/2$ frequency permutations such that: the $\tilde{a}_{ij}$ are 1-sparse; $\tilde{A}$ is not equivalent to $A$ and cannot be discriminated from it by any $\ell^p$ norm.

4.2 Stronger guarantees with disjoint supports and sparsity?

Could one get improved results by combining the disjoint support assumption from Theorem 1 and the sparsity assumption from Theorem 2? For even $L \ge 4$, Lemma 4 below indicates the existence of sparse matrices of filters with disjoint supports that are the sparsest but not unique (even up to a global permutation) solution of the considered problem: certain frequency permutations of "size" $\Delta = L/2k'$ provide an equally good but not equivalent solution.

Lemma 4 For any integers $k' \le k \le L/2$ such that $2k'$ divides $L$, there exist a matrix of $k$-sparse filters $A$ with disjoint supports and a set of $L/2k'$ frequency permutations resulting in $\tilde{A} \not\sim A$, such that for all $0 \le p < \infty$: $\|\tilde{A}\|_p = \|A\|_p$ and

$$\|\tilde{a}_{ij}\|_p = \|a_{ij}\|_p, \quad \forall i, j. \qquad (12)$$

We have $2k' \cdot \Delta(\tilde{A}, A) = L$.

Specializing Lemma 4 to $k' = 1$ and $k = 2$ for even $L \ge 4$ yields 2-sparse filters and a set of $L/2$ frequency permutations such that: the $\tilde{a}_{ij}$ are 2-sparse; $\tilde{A}$ is not equivalent to $A$ and cannot be discriminated from it by any $\ell^p$ norm.

This shows that even by adding the disjoint support assumption, for even $L \ge 4$, there is little margin to improve Lemma 1: at best, one can hope to replace the strict inequality in (9) with a non-strict one. Can this actually be done? This is partially answered by the following results:

Lemma 5 Assume that $\tilde{A} \not\sim A$, that (4) holds with

$$2k \cdot \Delta(\tilde{A}, A) = L \qquad (13)$$

and that the filters in $A$ have disjoint supports (3). Then, either $\|\tilde{A}\|_0 > \|A\|_0$, or each row of $\tilde{A}$ is obtained by permuting pairs of distinct filters $a_{ij}, a_{ij'}$ from the corresponding row of $A$ such that $a_{ij} - a_{ij'}$ is proportional to a modulated and translated Dirac comb with $2k$ spikes.

For filter matrices with a single row, since $\tilde{A} \sim A$ means that the filters $\tilde{a}_{1j}$ are permuted versions of the $a_{1j}$, we obtain

Corollary 2 Consider $A$ with a single row ($M = 1$). Assume that $\tilde{A} \not\sim A$, that (4) holds with

$$2k \cdot \Delta(\tilde{A}, A) = L \qquad (14)$$

and that the filters in $A$ have disjoint supports (3). Then $\|\tilde{A}\|_0 > \|A\|_0$.
4.3 Excessive pessimism?

The counter-examples built in Lemmata 3 and 4, which are associated with Dirac combs, are highly structured. They provide worst-case well-posedness bounds, but existing probabilistic versions of uncertainty principles (see, e.g., the nice survey [15]) lead us to conjecture that if the sparse filters in $A$ are drawn at random (e.g., from a Bernoulli-Gaussian distribution), the uniqueness guarantee of Theorem 2 will hold except with small probability $O(L^{-\beta})$, provided that $k < c(\beta) L / \log L$, for large $L$. This is left to further theoretical investigation.

5 Numerical experiments 

The results achieved so far are theoretical well-posedness guarantees, but they do not quite provide algorithms to compute the potentially unique (up to global permutation) solution of the frequency permutation problem. We conclude this paper with the description of a relatively naive optimization algorithm, an empirical assessment of its performance with Monte-Carlo simulations, and a discussion of how this compares with the theoretical uniqueness guarantees achieved above.

5.1 Proposed combinatorial algorithm

Given a "permuted" matrix $\hat{A}$, one wishes to find a set of frequency permutations yielding a new matrix $\tilde{A}$ with minimum $\ell^p$ norm.

The proposed algorithm starts from $\tilde{A}_0 = \hat{A}$. Given $\tilde{A}_n$, a candidate matrix $\tilde{A}_{n+1,\pi}$ can be obtained by applying a permutation $\pi$ at frequency $\omega_n = n \bmod L$. Testing each possible permutation $\pi$ and retaining the one, $\pi_n$, which minimises $\|\tilde{A}_{n+1,\pi}\|_p^p$ yields the next iterate $\tilde{A}_{n+1} := \tilde{A}_{n+1,\pi_n}$. The procedure is repeated until the $\ell^p$ norm of $\tilde{A}_n$ ceases to change. Since there is a finite number of permutations to try, the stopping criterion is met after sufficiently many iterations.
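A minimal Python/NumPy sketch of this greedy procedure is given below. It is an illustration under stated assumptions rather than the authors' implementation: filters are stored as an array of shape (M, N, L), the stopping test is applied once per full sweep over the $L$ frequencies, a small tolerance `tol` prevents accepting "improvements" caused only by rounding, and `max_sweeps` is a hypothetical safeguard. Each candidate evaluation uses one inverse FFT per filter, in line with the cost analysis of Section 5.8.

```python
import itertools
import numpy as np

def lp_cost(F, p):
    """||.||_p^p of the time-domain filters whose DFTs are stored in F, shape (M, N, L)."""
    return float(np.sum(np.abs(np.fft.ifft(F, axis=-1)) ** p))

def greedy_permutation_solver(A_hat, p=1.9, max_sweeps=100, tol=1e-12):
    """Greedy frequency-by-frequency l^p minimisation over source permutations.

    Visits w = 0, 1, ..., L-1 in turn; at each frequency it tries the N!
    permutations of the N columns and keeps the one with the smallest
    l^p cost. Stops when a full sweep no longer decreases the cost.
    """
    F = np.fft.fft(A_hat, axis=-1)
    _, N, L = F.shape
    perms = [list(pi) for pi in itertools.permutations(range(N))]
    cost = lp_cost(F, p)
    for _ in range(max_sweeps):
        cost_before_sweep = cost
        for w in range(L):
            column = F[:, :, w].copy()          # current coefficients at frequency w
            best_cost, best_pi = cost, None
            for pi in perms:
                F[:, :, w] = column[:, pi]      # candidate: permute sources at w
                candidate = lp_cost(F, p)
                if candidate < best_cost - tol:
                    best_cost, best_pi = candidate, pi
            # keep the best candidate, or restore the current coefficients
            F[:, :, w] = column if best_pi is None else column[:, best_pi]
            cost = best_cost
        if cost >= cost_before_sweep - tol:
            break
    return np.fft.ifft(F, axis=-1)

# Example: two 1-sparse filters, sources swapped at frequency 0 only.
A = np.zeros((1, 2, 7))
A[0, 0, 1], A[0, 1, 4] = 2.0, -1.0
F = np.fft.fft(A, axis=-1)
F[:, [0, 1], 0] = F[:, [1, 0], 0]
A_hat = np.fft.ifft(F, axis=-1)
A_rec = greedy_permutation_solver(A_hat, p=1.0)
print(np.allclose(A_rec, A))   # True
```

In this small example the swap is undone at the first visited frequency; in general the greedy search gives no such guarantee, which is consistent with the failures reported below.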

5.2 Choice of the $\ell^p$ criterion

In theory, it could happen that the stopping criterion is only met after a combinatorially large number of iterations. However, the algorithm stops much sooner in practice. In fact, if we were to use the $\ell^0$ norm, the algorithm would typically stop after just one iteration, because the $\ell^0$ norm attains its maximum value $M \times N \times L$ for most frequency permutations except a few very special ones. For this reason, we chose to test the algorithm using $\ell^p$ norms with $p > 0$, which are not as "locally constant" as the $\ell^0$ norm. To our surprise, the experiments below will show that the best performance is not achieved for small $p$, but rather for $p = 2 - \epsilon$ with small $\epsilon > 0$. For $p = 2$ and for $p > 2$, the algorithm indeed completely fails; for $p = 2$ this is expected, since by Parseval's identity the $\ell^2$ norm is invariant under frequency permutations.
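The failure at $p = 2$ can be checked numerically. The sketch below (NumPy, random data chosen only for illustration) applies an independent random source permutation at each frequency and compares the $\ell^2$ and $\ell^1$ costs before and after: since a permutation at frequency $\omega$ preserves $\sum_j |a_{ij}[\omega]|^2$ for each channel, Parseval's identity keeps the $\ell^2$ cost unchanged up to rounding, so it carries no information about the permutations.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, L = 2, 3, 16
A = rng.standard_normal((M, N, L))

# Apply an independent random permutation of the N sources at every frequency.
F = np.fft.fft(A, axis=-1)
for w in range(L):
    F[:, :, w] = F[:, rng.permutation(N), w]
A_tilde = np.fft.ifft(F, axis=-1)

def lp_p(X, p):
    """Sum of |x|^p over all entries of X."""
    return float(np.sum(np.abs(X) ** p))

print(lp_p(A, 2.0), lp_p(A_tilde, 2.0))   # equal up to rounding (Parseval)
print(lp_p(A, 1.0), lp_p(A_tilde, 1.0))   # generally different
```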

5.3 Monte-Carlo simulations

For various filter lengths $L$, sparsity levels $k$ and dimensions $M, N$, random sparse filter matrices $A$ made of independent random $k$-sparse filters were generated. Each filter was drawn by choosing: a) a support of size $k$ uniformly at random; b) i.i.d. Gaussian coefficients on this support.
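The random draws described above can be sketched as follows. The NumPy function names are illustrative, and the use of standard Gaussian coefficients and of one independent, uniformly random permutation per frequency are assumptions about details the text does not fix.

```python
import numpy as np

def random_sparse_filters(M, N, L, k, rng):
    """M x N matrix of independent k-sparse filters of length L, shape (M, N, L).

    Each filter: a) support of size k drawn uniformly at random,
    b) i.i.d. standard Gaussian coefficients on that support.
    """
    A = np.zeros((M, N, L))
    for i in range(M):
        for j in range(N):
            support = rng.choice(L, size=k, replace=False)
            A[i, j, support] = rng.standard_normal(k)
    return A

def random_frequency_permutations(N, L, rng):
    """One independent, uniformly random permutation of the N sources per frequency."""
    return np.array([rng.permutation(N) for _ in range(L)])

rng = np.random.default_rng(0)
A = random_sparse_filters(M=2, N=3, L=31, k=4, rng=rng)
sigmas = random_frequency_permutations(N=3, L=31, rng=rng)
print(A.shape, sigmas.shape)   # (2, 3, 31) (31, 3)
```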
Figure 1: Histogram of SNR (in decibels) between the best permutation of $\tilde{A}$ and the original $A$.



For each configuration $(L, k, M, N)$, 200 such random matrices $A$ were drawn. For each $A$, independent random frequency permutations were applied to obtain $\hat{A}$. The algorithm was then applied to obtain $\tilde{A}$. The performance was measured using the SNR between $A$ and the best (global) permutation of $\tilde{A}$.

Figure 1 shows the histogram of SNR values achieved for $L = 31$, $1 \le k \le L$, $M \in \{1, 2\}$, $N \in \{2, 3, 4\}$, $p = 1$. It shows that the algorithm either completely succeeds up to machine precision (SNR above 300 dB) or completely fails (very low SNR). For this reason, in the rest of the experiments, the estimation was considered a success when the SNR exceeded 100 dB.
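The performance measure can be sketched as below. The formula $10 \log_{10}\big(\|A\|_2^2 / \|A - \tilde{A}_\pi\|_2^2\big)$, maximized over global column permutations $\pi$ of $\tilde{A}$, is our reading of "SNR between $A$ and the best permutation of $\tilde{A}$" and is an assumption, as are the function names; the 100 dB success threshold is the one used in the text.

```python
import itertools
import numpy as np

def snr_db(A, A_tilde):
    """SNR in dB between A and the best global column permutation of A_tilde."""
    signal = np.sum(np.abs(A) ** 2)
    best = -np.inf
    for pi in itertools.permutations(range(A.shape[1])):
        error = np.sum(np.abs(A - A_tilde[:, list(pi), :]) ** 2)
        snr = np.inf if error == 0 else 10 * np.log10(signal / error)
        best = max(best, snr)
    return best

def is_success(A, A_tilde, threshold_db=100.0):
    """Success criterion used in the experiments: SNR above 100 dB."""
    return snr_db(A, A_tilde) > threshold_db

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 31))
print(is_success(A, A[:, [1, 0, 2], :]))                          # True
print(is_success(A, A + 1e-3 * rng.standard_normal(A.shape)))     # False
```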



5.4 Role of the $\ell^p$ norm

Figure 2 displays the success rate as a function of the relative sparsity $k/L$, for various choices of the $\ell^p$ criterion, with filters of prime length $L = 131$, $N = 2$ sources and $M = 5$ channels. The vertical dashed line indicates the threshold $k/L < \alpha(2)$ associated with the well-posedness guarantee (using an $\ell^0$ criterion) of Theorem 2. Surprisingly, one can observe that the success rate increases when $p$ is increased within $0 < p < 2$. The maximum success rate is achieved when $p = 2 - \epsilon$ with small $\epsilon > 0$.

Beyond the well-posedness regime suggested by the theory (i.e., to the right of the vertical dashed line), the algorithm can succeed, but at a rate that rapidly decreases when the relative sparsity $k/L$ increases. In the regime where the problem is proved to be well-posed, the proposed algorithm is often successful but can still fail to perfectly recover the filters, especially, and surprisingly, for small values of $k$. This phenomenon is strongly marked for $p < 1$ and essentially disappears for $p > 1$. It remains an open question to determine the respective roles of the $\ell^p$ criterion and of the naive greedy optimization algorithm in this limited performance when the problem is well-posed with respect to the $\ell^0$ norm.
Figure 2: Filter recovery success as a function of $p$, for values of $p$ up to 1.9 (average success rate over 200 draws, $L = 131$, $N = 2$, $M = 5$).

Figure 3: Filter recovery success as a function of $L$, for $p = 1.9$ (average success rate over 200 draws versus $k/L$, $M = N = 2$; curves for $L \in \{32, 64, 128, 256, 512\}$ and $L \in \{31, 61, 127, 257, 509\}$).



5.5 Role of the filter length L 

Figure 3 shows the results for different values of $L$ with $p = 1.9$, $M = N = 2$. The average performance does not seem to depend on whether $L$ is prime or not. As $L$ increases, the performance for "small" $k/L$ slightly increases, but the success rate degrades for "large" $k/L$ close to $\alpha(2)$.



5.6 Role of the number of channels M 

Figure 4 shows the results for increasing numbers of channels $M$, with a filter length $L = 512$, $N = 2$ sources, $p = 1.9$. One can observe that the success rate substantially increases when $M$ is increased from $M = 1$ to $M = 2$, and slightly increases as $M$ further increases. Although the worst-case well-posedness guarantees are the same, the algorithm seems to benefit from the added filter diversity across channels.
Figure 4: Filter recovery success as a function of $M$, for $p = 1.9$ (average success rate over 200 draws, $L = 512$, $N = 2$).

Figure 5: Filter recovery success as a function of $N$, for $p = 1.9$ (average success rate over 200 draws, $L = 131$, $M = 5$).



5.7 Role of the number of sources $N$

Figure 5 shows the success rate as a function of the relative sparsity $k/L$, for $N \in \{2, 3, 4\}$, with $L = 31$, $M = 5$ and $p = 1.9$. The well-posedness limits $k/L < \alpha(N)$ associated with Theorem 2 are indicated with vertical dashed lines. The empirical curves confirm that the algorithm can still succeed beyond the worst-case well-posedness guarantees, but with a rapidly decreasing rate of success. When the well-posedness guarantees hold, the algorithm can fail, but its rate of success is high when the relative sparsity is sufficiently small compared to the bound provided by Theorem 2.

5.8 Computation time

At each of the $L$ frequencies, the algorithm evaluates the $\ell^p$ norm for each of the $N!$ permutations of the sources.
Figure 6: Computation time of the permutation solving algorithm depending on the length $L$ of the filter.



To evaluate the $\ell^p$ norm of the filters, the permuted frequency coefficients have to be transformed back into the time domain by inverse Discrete Fourier Transform. For each filter, the cost of the Discrete Fourier Transform through a Fast Fourier Transform is $O(L \log_2 L)$. There are $MN$ filters, hence the cost of $\ell^p$ norm evaluation for a given configuration of sub-bands is $O(MNL \log_2 L)$.

Hence, the complexity of each sweep through the set of all frequencies is $O(N!\, MNL^2 \log_2 L)$. This is rather expensive because the computational cost grows factorially with the number of sources and quadratically with the filter length, but it is tractable for small problem sizes and very efficient compared to the brute-force approach, which would require $O((N!)^{L-1} MNL \log_2 L)$ operations to test all $(N!)^{L-1}$ possible permutations up to a global permutation.

Figure 6 shows the average computation time over 200 trials for various filter lengths. The red dashed line corresponds to its prediction using the theoretical cost estimate $C \times L^2 \log_2 L$ with $C \approx 40$ nanoseconds.
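The operation counts above can be compared with a few lines of Python. The sketch works with base-10 logarithms to avoid overflowing floating-point numbers for the brute-force count, ignores constant factors, and uses $C \approx 40$ ns only to reproduce the fitted prediction $C L^2 \log_2 L$; function names are illustrative.

```python
from math import factorial, log10, log2

def log10_sweep_ops(M, N, L):
    """log10 of N! * M * N * L^2 * log2(L), the cost of one sweep."""
    return log10(factorial(N) * M * N * L**2 * log2(L))

def log10_brute_force_ops(M, N, L):
    """log10 of (N!)^(L-1) * M * N * L * log2(L), the cost of exhaustive search."""
    return (L - 1) * log10(factorial(N)) + log10(M * N * L * log2(L))

def predicted_time_seconds(L, C=40e-9):
    """Fitted timing model C * L^2 * log2(L), with C in seconds."""
    return C * L**2 * log2(L)

print(round(log10_sweep_ops(5, 2, 131), 1), round(log10_brute_force_ops(5, 2, 131), 1))
print(predicted_time_seconds(512))
```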



6 Conclusions 

It is now well known that a sufficient sparsity assumption can be used to make under-determined linear inverse problems well-posed: without the sparsity assumption, the problem admits an affine set of solutions, which intersects the set of sparse vectors at only one point. Besides this well-posedness property, a key factor that has led to the large deployment of sparse models and methods in various fields of science is the fact that a convex relaxation of the NP-hard $\ell^0$ minimization problem can be guaranteed to find this unique solution under certain sparsity assumptions. The availability of efficient convex solvers then makes the problem truly tractable.

The problem considered in this paper is not a linear inverse problem. Even though it is a simplification of the original permutation and scaling problem arising from signal processing, it remains a priori a much harder problem than linear inverse problems in terms of the structure of the solution set: each solution comes with a whole family of solutions that are equivalent up to a global permutation.

The fact that we managed to obtain well-posedness results in this context is encouraging, but this is at best the beginning of the story: even if the solution is unique, how do we efficiently compute it? Can one hope to extend these results to the original permutation and scaling problem? Why does the proposed naive algorithm perform better for $p > 1$? Answers to these questions are likely to have an impact in fields such as blind source separation with sparse multipath channels.

A Proof of Theorem 1 

First, notice that for each frequency $\omega$ and channel $i$, permutations preserve the equality $\sum_j \tilde{a}_{ij}[\omega] = \sum_j a_{ij}[\omega]$. Thus, the same holds in the time domain: $\sum_j \tilde{a}_{ij} = \sum_j a_{ij}$. By the disjoint supports hypothesis and the quasi-triangle inequality for $\ell^p$ quasi-norms, we have

$$\sum_j \|a_{ij}\|_p^p = \Big\|\sum_j a_{ij}\Big\|_p^p = \Big\|\sum_j \tilde{a}_{ij}\Big\|_p^p \le \sum_j \|\tilde{a}_{ij}\|_p^p. \qquad (15)$$

We conclude by summing over all channels $i$.

B Proof of Lemma 2 and Corollary 1

Let us consider the matching count matrix $C$ with entries

$$C_{jn} := \sharp\{0 \le \omega < L : \sigma_\omega(j) = n\}, \quad 1 \le j, n \le N.$$

Since $\sum_j C_{jn} = \sum_n C_{jn} = L$, we have $C = L \cdot B$ where $B$ is bi-stochastic.

A weakened version of Lemma 2, with $2\alpha''(N) = 1/(1 + (N-1)^2)$, can be obtained by combining the Birkhoff-von Neumann theorem and Carathéodory's theorem.

Theorem 3 (Birkhoff-von Neumann Theorem [16], [17]) Every bi-stochastic matrix is in the convex hull of permutation matrices.

Theorem 4 (Carathéodory's Theorem [18]) Let $X$ be a non-empty subset of an affine space of dimension $n \ge 1$. Then every element of the convex hull of $X$ is a convex combination of $p$ elements of $X$, with $p \le n + 1$.

The set of bi-stochastic matrices is contained in an affine subspace of $\mathbb{R}^{N \times N}$. This subspace is defined by $2N$ equations, but these equations are linearly dependent since the sum of the row sums equals the sum of the column sums. Hence its affine dimension is $n \le N^2 - (2N - 1) = (N-1)^2$, and we conclude from Carathéodory's theorem that every bi-stochastic matrix is a convex combination of at most $(N-1)^2 + 1$ permutation matrices. One of the coefficients of this combination must therefore be at least $1/(1 + (N-1)^2)$, and this leads to a version of Lemma 2 with $2\alpha''(N) = 1/(1 + (N-1)^2)$, as claimed.

Yet, this bound is suboptimal. The optimal bound in Lemma 2 follows from Hall's Marriage Theorem, which incidentally is also a key ingredient in the proof of the Birkhoff-von Neumann theorem.
Theorem 5 (Hall's Marriage Theorem [14], [19]) Let $(A_j)_{j \in J}$ be a family of subsets of a finite set $S$, with $\sharp J = \sharp S$. There exists a bijection $\pi : J \to S$ such that $\pi(j) \in A_j$ for all $j$ if and only if for all $E \subset J$

$$\sharp \bigcup_{j \in E} A_j \ge \sharp E.$$

The bijection $\pi$ is often referred to as a transversal for $S$.
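For small families, Hall's condition and the existence of a transversal can both be checked by brute force, which makes the equivalence in Theorem 5 easy to test on examples. The sketch below is illustrative only (its cost is exponential) and assumes a family given as a list of Python sets indexed by $J = \{0, \dots, \sharp S - 1\}$; the function names are hypothetical.

```python
from itertools import combinations, permutations

def hall_condition(sets):
    """True iff #(union of A_j, j in E) >= #E for every nonempty subfamily E."""
    indices = range(len(sets))
    return all(len(set().union(*(sets[j] for j in E))) >= len(E)
               for r in range(1, len(sets) + 1)
               for E in combinations(indices, r))

def find_transversal(sets, S):
    """Return a tuple pi with pi[j] in sets[j], pi a bijection onto S, or None."""
    for image in permutations(S):
        if all(image[j] in sets[j] for j in range(len(sets))):
            return image
    return None

family = [{0, 1}, {0}, {1, 2}]
print(hall_condition(family), find_transversal(family, [0, 1, 2]))  # True (1, 0, 2)
```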

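Proof (Lemma 2) For brevity we write $\alpha$ for $\alpha(N)$. Define the sets $J = S = [1, N]$ and $A_j := \{n : B_{jn} \ge 2\alpha\}$, $j \in J$, and consider the property

$$\mathcal{P}_k : \forall E \subset J,\ \sharp E \le k \Rightarrow \sharp \bigcup_{j \in E} A_j \ge \sharp E.$$

We wish to prove that $\mathcal{P}_k$ holds true for all $1 \le k \le N$: then, by Hall's Marriage Theorem, there exists a bijection $\pi : J \to S$ such that $\pi(j) \in A_j$ for all $j$, yielding in turn the permutation matrix $P$ with ones at the entries $(j, \pi(j))$. We proceed by contradiction: assume that $\mathcal{P}_N$ does not hold true. Since $\mathcal{P}_1$ holds true (each row of $B$ sums to one, so it has an entry at least $1/N \ge 2\alpha$), without loss of generality, for some $1 \le k_0 < N$:

$$\sharp \bigcup_{1 \le j \le k_0} A_j \ge k_0, \quad \text{and} \quad \sharp \bigcup_{1 \le j \le k_0 + 1} A_j \le k_0.$$

Hence, without loss of generality:

$$\bigcup_{1 \le j \le k_0} A_j = [1, k_0] \supseteq A_{k_0 + 1}.$$

It follows that for $n > k_0$ and $j \le k_0 + 1$, we have $n \notin A_j$, hence $B_{jn} < 2\alpha$. Now we use the bi-stochasticity of $B$ ($\sum_j B_{jn} = \sum_n B_{jn} = 1$, $B_{jn} \ge 0$) to obtain

$$\begin{aligned} k_0 &\ge \sum_{n \le k_0} \sum_{j \le k_0 + 1} B_{jn} = \sum_{j \le k_0 + 1} \sum_{n \le k_0} B_{jn} \\ &= \sum_{j \le k_0 + 1} \Big(1 - \sum_{n > k_0} B_{jn}\Big) > \sum_{j \le k_0 + 1} \big(1 - (N - k_0)\, 2\alpha\big) \\ &= (k_0 + 1)\big(1 - (N - k_0)\, 2\alpha\big) \\ &= k_0 + \big(1 - (k_0 + 1)(N - k_0)\, 2\alpha\big). \end{aligned}$$

This implies $2\alpha > 1/\big((k_0 + 1)(N - k_0)\big)$. However, this yields a contradiction, since a simple functional study shows that

$$\min_{1 \le k_0 < N} \frac{1}{(k_0 + 1)(N - k_0)} = 2\alpha.$$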

Equipped with Lemma 2, we can now prove Corollary 1.

Proof (Corollary 1) Since $C = L \cdot B$ where $B$ is bi-stochastic, there is a permutation $\pi$ such that $C_{j\pi(j)} \ge 2L\alpha(N)$ for all $1 \le j \le N$.
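Lemma 2 and Corollary 1 can be checked numerically on random permutation tuples. The sketch below (NumPy; illustrative names; brute force over the $N!$ global permutations) builds the matching count matrix $C$ defined above, verifies that all its row and column sums equal $L$, and checks that the best global permutation reaches the bound $2L\alpha(N)$ of (10).

```python
import itertools
from fractions import Fraction
import numpy as np

def alpha(N):
    """alpha(N) from (5)."""
    return Fraction(2, N * (N + 2)) if N % 2 == 0 else Fraction(2, (N + 1) ** 2)

def matching_counts(sigmas):
    """C[j, n] = #{w : sigma_w(j) = n} for an (L, N) array of permutations (0-based)."""
    L, N = sigmas.shape
    C = np.zeros((N, N), dtype=int)
    for sigma in sigmas:
        C[np.arange(N), sigma] += 1
    return C

def best_global_permutation(C):
    """Global permutation pi maximising min_j C[j, pi(j)] (brute force)."""
    N = C.shape[0]
    return max(itertools.permutations(range(N)),
               key=lambda pi: min(C[j, pi[j]] for j in range(N)))

rng = np.random.default_rng(0)
N, L = 4, 31
sigmas = np.array([rng.permutation(N) for _ in range(L)])
C = matching_counts(sigmas)
pi = best_global_permutation(C)
worst = int(min(C[j, pi[j]] for j in range(N)))
print(C.sum(axis=0), C.sum(axis=1))      # every row and column sums to L
print(worst, worst >= 2 * L * alpha(N))  # Corollary 1 guarantees True
```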



We conclude this section by showing the sharpness of Corollary [T] through 
the construction of permutations that reach the bound. Consider N an integer, 
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and k := N/2 (N even) or k := (N - l)/2 (N odd). Let L be a multiple of 
(ko + 1)(-/V — fc ). Consider the L x N matrix: 



1 

2 



U 
1 



k 
U 



k "■■ 
U k 



U 
1 



where: a) the left part, of size L x (ko + 1), is filled with the column vectors 
i £ R L /( fc o+!) made of constant entries equal to the integer 1 < i < ko and 
the vector U S R L /( fe o+i) mac i c Q f the vertical concatenation of the N — ko 
column vectors j S R i /( ft o+i)(JV"-/so) w jth constant entries fco + 1 < j < N; b) 
the rows of the the right part, of size L x (N — ko — 1), include exactly once each 
integer 1 < £ < N which does not already appear in the corresponding row of 
the left part. By construction, the L rows of the matrix X are associated to L 
permutations <j£. We now show that, for any global permutation n, there is at 
least one column 1 < j < ko + 1 such that 

${t ■ (T t (j) = < L /( k o + 1)(N - k ) = L2a(N). 

Applying the pigeonhole principle again: since only $k_0$ integers are at most $k_0$, among the $k_0 + 1$ indices $j$ to consider at least one, $j^*$, must be mapped to an integer $\pi(j^*) \ge k_0 + 1$. By construction, column $j^*$ of $X$ contains at most (in fact: exactly) $L/((k_0+1)(N-k_0))$ instances of the value $\pi(j^*)$.
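The construction can be checked numerically for small $N$. The following sketch rests on one stated assumption: the left part is read as a block-circulant arrangement of the symbols $1, \dots, k_0, U$, which is a natural reading of the layout above and not necessarily the authors' exact one. The helper names are hypothetical. It builds $X$, counts $C_{jn} = \#\{\ell : \sigma_\ell(j) = n\}$, and finds by brute force the best achievable $\min_j C_{j\pi(j)}$. This value should equal $L/((k_0+1)(N-k_0)) = L\,2a(N)$.

```python
import itertools
import numpy as np

def build_sharp_example(N, L):
    """L x N matrix X whose rows are permutations of 1..N (1-based values)."""
    k0 = N // 2                          # N/2 for N even, (N-1)/2 for N odd
    assert L % ((k0 + 1) * (N - k0)) == 0, "L must be a multiple of (k0+1)(N-k0)"
    H = L // (k0 + 1)                    # height of one block row
    h = L // ((k0 + 1) * (N - k0))       # height of each constant piece of U
    X = np.zeros((L, N), dtype=int)
    for t in range(L):
        r, i = divmod(t, H)              # block row r, offset i inside it
        for c in range(k0 + 1):
            s = (c - r) % (k0 + 1)       # assumed circulant symbol index
            # symbols 0..k0-1 are the constants 1..k0; symbol k0 is U
            X[t, c] = s + 1 if s < k0 else k0 + 1 + i // h
        # right part: the values missing from the left part, in increasing order
        X[t, k0 + 1:] = sorted(set(range(1, N + 1)) - set(X[t, :k0 + 1].tolist()))
    return X

def count_matrix(X):
    """C[j, n-1] = number of rows l with sigma_l(j) = n."""
    L, N = X.shape
    C = np.zeros((N, N), dtype=int)
    for j in range(N):
        for v in X[:, j]:
            C[j, v - 1] += 1
    return C

def best_bottleneck(C):
    """max over permutations pi of min_j C[j, pi(j)], by brute force."""
    N = C.shape[0]
    return max(min(C[j, pi[j]] for j in range(N))
               for pi in itertools.permutations(range(N)))

for N, L in [(3, 4), (4, 6), (4, 12), (5, 9), (6, 12)]:
    C = count_matrix(build_sharp_example(N, L))
    assert (C.sum(axis=0) == L).all() and (C.sum(axis=1) == L).all()  # C / L bi-stochastic
    k0 = N // 2
    assert best_bottleneck(C) == L // ((k0 + 1) * (N - k0))          # bound is reached
```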



C Proof of Theorem 2

We can now conclude the proof of Theorem 2. By Corollary 1 there is a permutation $\pi$ such that for each $j$ we have $\|F(a_{\cdot j} - \tilde a_{\cdot \pi(j)})\|_0 \le L(1 - 2a(N))$, hence $\Delta(A, \tilde A \mid \pi) \le L(1 - 2a(N))$ and finally $\Delta(A, \tilde A) \le L(1 - 2a(N))$. Combined with the assumption $k < La(N)$, we obtain $2k + \Delta < 2La(N) + L(1 - 2a(N)) = L$, and we conclude thanks to Lemma 1.



D Proof of Lemmata ffl [3] H O 

We prove Lemma 1 first, then the statements of Lemma 3 in the following order: 1), 3), 2). We begin with some notation and facts regarding Dirac combs.



D.1 Dirac combs

Let $p, q \ge 1$ be two integers and $L = pq$ their product. The unit Dirac comb with $q$ spikes and of step $p$, denoted $m_q$, is the vector of $\mathbb{C}^L$ defined by $m_q[t] = 1/\sqrt{q}$ if $t \equiv 0\ [p]$, and $m_q[t] = 0$ otherwise. Its Fourier transform is the unit Dirac comb with $p$ spikes and of step $q$: $F m_q = m_p$. For $0 \le n < p$ an integer translation index and $0 \le m < q$ an integer modulation index, one can define the translated and modulated Dirac comb $m_{q,n,m} = T_n M_m m_q$, where $T_n$ is the circular shift by $n$ samples and $M_m$ is the frequency modulation $(M_m u)[t] := u[t] \cdot e^{2i\pi mt/L}$. One can check that the collection $\{m_{q,n,m}\}_{0 \le n < p,\ 0 \le m < q}$ is an orthonormal basis of $\mathbb{C}^L$.
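Both facts can be verified numerically. The sketch below assumes a unitary DFT with NumPy's sign convention $e^{-2i\pi\omega t/L}$; the unit normalization is what makes $F m_q = m_p$ hold exactly. `dirac_comb` is a hypothetical helper that follows the definition $m_{q,n,m} = T_n M_m m_q$ above. The values $L = 12$, $p = 3$, $q = 4$ are chosen only for illustration.

```python
import numpy as np

def dirac_comb(L, q, n=0, m=0):
    """m_{q,n,m} = T_n M_m m_q in C^L (q must divide L)."""
    p = L // q
    assert p * q == L, "q must divide L"
    t = np.arange(L)
    m_q = np.where(t % p == 0, 1.0 / np.sqrt(q), 0.0)   # q spikes of step p
    modulated = m_q * np.exp(2j * np.pi * m * t / L)      # M_m
    return np.roll(modulated, n)                          # T_n: circular shift by n

def F(x):
    """Unitary DFT, NumPy sign convention exp(-2i pi w t / L) (an assumption)."""
    return np.fft.fft(x, norm="ortho")

L, p, q = 12, 3, 4
assert np.allclose(F(dirac_comb(L, q)), dirac_comb(L, p))      # F m_q = m_p
basis = np.stack([dirac_comb(L, q, n, m)
                  for n in range(p) for m in range(q)], axis=1)
assert np.allclose(basis.conj().T @ basis, np.eye(L))         # orthonormal basis of C^L
```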

D.2 Proof of Lemma 1

Let $\pi_0$ be the permutation such that $\Delta_0(A, \tilde A) = \min_{\pi \in \mathfrak{S}_N} \Delta_0(A, \tilde A \mid \pi)$. By abuse of notation we still denote by $\tilde A$ the matrix obtained by applying $\pi_0$ to permute the columns of the original filter matrix. For each channel $i$ and source index $j$ such that $\tilde a_{ij} = a_{ij}$ we obviously have $\|a_{ij}\|_0 \le \|\tilde a_{ij}\|_0$. Now, since $\Delta_0 \ge 1$ we have $\tilde A \ne A$, hence there exists a pair $i, j$ such that $\tilde a_{ij} \ne a_{ij}$. By the $\ell^0$ Dirac-Fourier uncertainty principle [12, Theorem 1], for any vector $u \in \mathbb{C}^L$ we have $\|u\|_0 \|Fu\|_0 \ge L$. Hence, by the hypothesis $k < L/(2\Delta_0)$ we have

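$$\begin{aligned}
\|a_{ij}\|_0 + \|\tilde a_{ij}\|_0 &\ge \|\tilde a_{ij} - a_{ij}\|_0 && (16)\\
&\ge L / \|F(\tilde a_{ij} - a_{ij})\|_0 && (17)\\
&\ge L/\Delta_0 > 2k && (18)\\
&\ge \|a_{ij}\|_0 + \|a_{ij'}\|_0 && (19)
\end{aligned}$$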

where $j'$ is an arbitrary source index. Hence for every $i, j$ such that $\tilde a_{ij} \ne a_{ij}$ and any $j'$, $\|\tilde a_{ij}\|_0 > \|a_{ij'}\|_0$, and we obtain

$$\|\tilde a_{ij}\|_0 > \max_{j'} \|a_{ij'}\|_0 \ge \|a_{ij}\|_0.$$

Overall, we have shown that $\|\tilde A\|_0 > \|A\|_0$.

When $L$ is prime, a stronger uncertainty principle $\|u\|_0 + \|Fu\|_0 \ge L + 1$ holds [13]. Hence, under the assumption $2k + \Delta_0 < L$ we can replace (17)-(18) with

$$\ldots \ge L + 1 - \|F(\tilde a_{ij} - a_{ij})\|_0 \ge L + 1 - \Delta_0 > 2k$$

to reach the same conclusion.
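The two uncertainty principles used above can be illustrated numerically. This is a sketch only. We assume a unitary DFT, `l0` is a hypothetical helper that counts entries above a small tolerance, and the random tests give evidence rather than proof. A Dirac comb attains $\|u\|_0\|Fu\|_0 = L$ for composite $L$ while violating the additive bound $L + 1$. For prime $L$, random sparse vectors satisfy both bounds.

```python
import numpy as np

def l0(x, tol=1e-9):
    """Numerical l0 'norm': number of entries with magnitude above tol."""
    return int(np.sum(np.abs(x) > tol))

def F(x):
    return np.fft.fft(x, norm="ortho")

# Equality case of ||u||_0 * ||Fu||_0 >= L: a comb with q spikes of step L/q.
L, q = 12, 3
comb = np.zeros(L)
comb[:: L // q] = 1.0
assert l0(comb) * l0(F(comb)) == L          # 3 * 4 = 12
# For composite L the additive bound fails: 3 + 4 < L + 1.
assert l0(comb) + l0(F(comb)) < L + 1

# Prime L: ||u||_0 + ||Fu||_0 >= L + 1 on random sparse vectors of every sparsity.
rng = np.random.default_rng(1)
L = 7
for s in range(1, L + 1):
    for _ in range(100):
        u = np.zeros(L, dtype=complex)
        support = rng.choice(L, size=s, replace=False)
        u[support] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
        assert l0(u) * l0(F(u)) >= L
        assert l0(u) + l0(F(u)) >= L + 1
```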

D.3 Proof of Lemma 3

We shall simply build an example where $A = [\alpha, \beta]$ is a $1 \times 2$ matrix of filters. Extensions to $A$ an $M \times N$ matrix are trivial by adding mutually distinct sparse columns that are distinct from $\alpha$ and $\beta$, and duplicating the first row.

We exploit Dirac combs as described in Appendix D.1. Define $a = m_{k,0,0}$ and $b = -m_{k,L/2k,0}$. The filters $a$ and $b$ have disjoint supports and satisfy $\|a\|_0 = \|b\|_0 = k$. Since $a - b = \sqrt{2}\, m_{2k,0,0}$, we have $\hat a[\omega] = \hat b[\omega]$ whenever $\omega \not\equiv 0\ [2k]$. Hence, permuting the Fourier transforms of $a$ and $b$ on the $L/2k$ frequencies $\{\omega = 2kr,\ 0 \le r < L/2k\}$ yields $\tilde a = b$ and $\tilde b = a$. Given any $u \in \mathbb{C}^L$ we define perturbations $\alpha$ and $\beta$ of $a$ and $b$:

$$\alpha := a + u, \qquad \beta := b + T_{L/2}\,u,$$

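with $T_{L/2}$ the circular shift by $L/2$ samples. Noticing that for $\omega = 2kr$

$$\widehat{T_{L/2}u}[\omega] = e^{-2i\pi\omega (L/2)/L}\, \hat u[\omega] = e^{-2i\pi kr}\, \hat u[\omega] = \hat u[\omega],$$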
we obtain that, after permuting the Fourier transforms of $\alpha$ and $\beta$ at the frequencies $\omega = 2kr$, $0 \le r < L/2k$,

$$\tilde\alpha = b + u, \qquad \tilde\beta = a + T_{L/2}\,u.$$

We choose the vector $u$ to be zero everywhere with two exceptions: $u[0] := -a[0]$ and $u[L/2k] := -b[L/2k]$. Since $T_{L/2}u \ne u$ and $a \ne b$, we have $\{\tilde\alpha, \tilde\beta\} \ne \{\alpha, \beta\}$ and $\tilde A \ne A$. Moreover, $\Delta_0(A, \tilde A) = \Delta_1(A, \tilde A) = L/2k$.

Lastly, all considered vectors have $k$ entries of equal magnitude, hence $\|\alpha\|_0 = \|\beta\|_0 = \|\tilde\alpha\|_0 = \|\tilde\beta\|_0 = k$, and for any $0 < p < \infty$, $\|\alpha\|_p = \|\beta\|_p = \|\tilde\alpha\|_p = \|\tilde\beta\|_p$. In particular, $\|A\|_p = \|\tilde A\|_p$ for $0 \le p < \infty$.
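The construction can be replayed numerically for one choice of parameters. This is a sketch under our own assumptions: $L = 16$ and $k = 2$ are illustrative values, and `comb` and `l0` are hypothetical helpers. Only Fourier coefficients are exchanged between $\alpha$ and $\beta$, so the DFT normalization plays no role. The code checks that $\tilde\alpha = b + u$, that $\tilde\beta = a + T_{L/2}u$, that $L/2k$ frequencies are permuted, and that the $\ell^0$ and $\ell^p$ (quasi-)norms agree.

```python
import numpy as np

def comb(L, q, n=0):
    """m_{q,n,0}: q spikes of height 1/sqrt(q), step L/q, circularly shifted by n."""
    x = np.zeros(L, dtype=complex)
    x[n::L // q] = 1.0 / np.sqrt(q)
    return x

def l0(x, tol=1e-9):
    """Numerical l0 'norm': number of entries with magnitude above tol."""
    return int(np.sum(np.abs(x) > tol))

L, k = 16, 2                              # illustrative values (2k divides L)
a = comb(L, k)                            # a = m_{k,0,0}
b = -comb(L, k, n=L // (2 * k))           # b = -m_{k,L/2k,0}
assert np.allclose(a - b, np.sqrt(2) * comb(L, 2 * k))

u = np.zeros(L, dtype=complex)
u[0], u[L // (2 * k)] = -a[0], -b[L // (2 * k)]
Tu = np.roll(u, L // 2)                   # T_{L/2} u
alpha, beta = a + u, b + Tu

# Exchange the Fourier transforms of alpha and beta at the frequencies w = 2kr.
swap = np.arange(L) % (2 * k) == 0
A_hat, B_hat = np.fft.fft(alpha), np.fft.fft(beta)
alpha_t = np.fft.ifft(np.where(swap, B_hat, A_hat))
beta_t = np.fft.ifft(np.where(swap, A_hat, B_hat))

assert np.allclose(alpha_t, b + u) and np.allclose(beta_t, a + Tu)
assert not np.allclose(alpha_t, alpha) and not np.allclose(alpha_t, beta)
assert int(swap.sum()) == L // (2 * k)    # number of permuted frequencies
vectors = (alpha, beta, alpha_t, beta_t)
assert [l0(v) for v in vectors] == [k] * 4
for p in (0.5, 1.0, 2.0):                 # equal l^p (quasi-)norms
    norms = [np.sum(np.abs(v) ** p) ** (1 / p) for v in vectors]
    assert np.allclose(norms, norms[0])
```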

D.4 Proof of Lemma 5

We repeat the construction of the proof of Lemma 3, starting from the Dirac combs $a = m_{k',0,0}$ and $b = -m_{k',L/2k',0}$. Since $k' < k < L/2$, we have $\ell := k - k' < L/2 - k'$, hence we can choose an $\ell$-sparse vector $u$ whose support is outside the support of $m_{2k',0,0}$ and such that $T_{L/2}u$ and $u$ have disjoint supports. The four vectors $\{a, b, u, T_{L/2}u\}$ have mutually disjoint supports, hence $\alpha$ and $\beta$ have disjoint supports, $\{\tilde\alpha, \tilde\beta\} \ne \{\alpha, \beta\}$ and $\tilde A \ne A$. Moreover, $\Delta_0(A, \tilde A) = \Delta_1(A, \tilde A) = L/2k'$. Lastly, we have $\|\alpha\|_0 = \|\beta\|_0 = \|\tilde\alpha\|_0 = \|\tilde\beta\|_0 = k' + \ell = k$, and the $\ell^p$ norms of these vectors are also equal, hence $\|A\|_p = \|\tilde A\|_p$ for $0 \le p < \infty$.
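A numerical replay of this variant follows, again as a sketch with illustrative values of our choosing ($L = 16$, $k' = 2$, $k = 3$) and hypothetical helpers. The $\ell$-sparse vector $u$ is one admissible choice and is not prescribed by the proof: it uses indices in $[0, L/2)$ off the support of $m_{2k',0,0}$, with entries of magnitude $1/\sqrt{k'}$ so that all entries share the same modulus.

```python
import numpy as np

def comb(L, q, n=0):
    """m_{q,n,0}: q spikes of height 1/sqrt(q), step L/q, circularly shifted by n."""
    x = np.zeros(L, dtype=complex)
    x[n::L // q] = 1.0 / np.sqrt(q)
    return x

def l0(x, tol=1e-9):
    return int(np.sum(np.abs(x) > tol))

L, k_prime, k = 16, 2, 3                  # illustrative: k' < k < L/2
ell = k - k_prime
step = L // (2 * k_prime)                 # support of m_{2k',0,0} = multiples of step
a = comb(L, k_prime)                      # a = m_{k',0,0}
b = -comb(L, k_prime, n=step)             # b = -m_{k',L/2k',0}

# Hypothetical choice of an ell-sparse u: indices in [0, L/2) off the comb support.
idx = [t for t in range(L // 2) if t % step != 0][:ell]
u = np.zeros(L, dtype=complex)
u[idx] = 1.0 / np.sqrt(k_prime)
Tu = np.roll(u, L // 2)                   # T_{L/2} u, also off the comb support

supports = [np.abs(v) > 0 for v in (a, b, u, Tu)]
assert sum(s.astype(int) for s in supports).max() == 1   # mutually disjoint supports

alpha, beta = a + u, b + Tu
swap = np.arange(L) % (2 * k_prime) == 0  # frequencies w = 2k'r
A_hat, B_hat = np.fft.fft(alpha), np.fft.fft(beta)
alpha_t = np.fft.ifft(np.where(swap, B_hat, A_hat))
beta_t = np.fft.ifft(np.where(swap, A_hat, B_hat))

assert np.allclose(alpha_t, b + u) and np.allclose(beta_t, a + Tu)
assert int(swap.sum()) == L // (2 * k_prime)
assert [l0(v) for v in (alpha, beta, alpha_t, beta_t)] == [k] * 4
```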

D.5 Proof of Lemma H 

As in the proof of Lemma 1, we consider $\tilde A$ the permuted matrix associated to the optimal permutation $\pi_0$. Using the inequality $2k \le L/\Delta_1 \le L/\Delta_0$ instead of $2k < L/\Delta_0$, we repeat steps (16)-(19) to obtain $\|\tilde a_{ij}\|_0 \ge \|a_{ij'}\|_0$ for any $j \in E_i := \{j : a_{ij} \ne \tilde a_{ij}\}$ and any $j'$. As a result $\|\tilde a_{ij}\|_0 \ge \|a_{ij}\|_0$ for all $i, j$. The assumption that $\|\tilde A\|_0 = \|A\|_0$ implies that $\|\tilde a_{ij}\|_0 = \|a_{ij}\|_0$ for all $i, j$.

By assumption, $\tilde A \ne A$, hence there are indices $i, j$ such that $\tilde a_{ij} \ne a_{ij}$. For such $i, j$, since $\|a_{ij}\|_0 = \|\tilde a_{ij}\|_0$, each inequality in (16)-(19) (the inequality $L/\Delta_0 > 2k$ being replaced with $L/\Delta_1 \ge 2k$) must in fact be an equality. This implies that: $\|a_{ij}\|_0 = \|\tilde a_{ij}\|_0 = k$; $2k$ divides $L$ and $\Delta_1 = L/2k$; the nonzero vector $b_{ij} := \tilde a_{ij} - a_{ij}$ must be an equality case of the $\ell^0$ uncertainty principle, with $\|b_{ij}\|_0 = 2k$ and $\|F b_{ij}\|_0 = L/2k$. As a result [13], $b_{ij}$ is a scaled, modulated and translated version of the Dirac comb $m_{2k}$ made of $2k$ Diracs spaced every $L/2k$ samples: there exist a scalar $\gamma_{ij} \ne 0$ and two integers $0 \le n_{ij} < L/2k$, $0 \le m_{ij} < 2k$ such that

$$b_{ij} = \gamma_{ij}\, m_{2k, n_{ij}, m_{ij}}.$$

Moreover, since $\|a_{ij}\|_0 = \|\tilde a_{ij}\|_0 = k$ and $\|\tilde a_{ij} - a_{ij}\|_0 = 2k$, the filters $\tilde a_{ij}$ and $a_{ij}$ have disjoint supports of size $k$. Hence, they are the restrictions of $b_{ij}$ (resp. of $-b_{ij}$) to their respective supports.
Now, define

$$E_{i,n,m} := \{j \in E_i : n_{ij} = n,\ m_{ij} = m\}.$$

As observed in the proof of Theorem 1, the equality $\sum_j \tilde a_{ij} = \sum_j a_{ij}$ holds, implying $\sum_{j \in E_i} b_{ij} = \sum_j b_{ij} = 0$. Taking inner products with the Dirac comb orthonormal basis $m_{2k,n,m}$, $0 \le n < L/2k$, $0 \le m < 2k$, yields

$$\sum_{j \in E_{i,n,m}} \gamma_{ij} = 0. \qquad (20)$$

Since $\gamma_{ij} \ne 0$, whenever $E_{i,n,m}$ is not empty it contains at least two distinct indices.
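The step leading to Eq. (20) relies only on the orthonormality of the combs $m_{2k,n,m}$. The illustration below is a sketch. `dirac_comb` is a hypothetical helper following the definition of Appendix D.1, and the coefficients $\gamma_j$ and cells $(n_j, m_j)$ are random values of our choosing, with $L = 16$ and $k = 2$. The inner product of $\sum_j \gamma_j m_{2k,n_j,m_j}$ with each basis comb recovers the sum of the $\gamma_j$ in that cell. A single nonzero term in a cell therefore cannot be cancelled by terms from other cells.

```python
import numpy as np

def dirac_comb(L, q, n=0, m=0):
    """m_{q,n,m} = T_n M_m m_q in C^L (q must divide L)."""
    p = L // q
    t = np.arange(L)
    m_q = np.where(t % p == 0, 1.0 / np.sqrt(q), 0.0)
    return np.roll(m_q * np.exp(2j * np.pi * m * t / L), n)

L, k = 16, 2                               # illustrative values
q, p = 2 * k, L // (2 * k)                 # combs m_{2k,n,m}: 0 <= n < L/2k, 0 <= m < 2k
rng = np.random.default_rng(0)

# Hypothetical terms gamma_j * m_{2k,n_j,m_j}; some may share a cell (n, m).
cells = [(int(rng.integers(p)), int(rng.integers(q))) for _ in range(6)]
gammas = rng.standard_normal(6) + 1j * rng.standard_normal(6)
total = sum(g * dirac_comb(L, q, n, m) for g, (n, m) in zip(gammas, cells))

# Inner products with the orthonormal basis recover the per-cell sums of gammas.
for n in range(p):
    for m in range(q):
        coeff = np.vdot(dirac_comb(L, q, n, m), total)
        expected = sum(g for g, c in zip(gammas, cells) if c == (n, m))
        assert np.isclose(coeff, expected)

# Terms in two different cells never cancel each other.
lone = 0.7 * dirac_comb(L, q, 1, 3) + 0.5 * dirac_comb(L, q, 2, 0)
assert not np.allclose(lone, 0)
```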

By the disjoint support assumption, for $j, j' \in E_{i,n,m}$ with $j \ne j'$, the original filters $a_{ij}$ and $a_{ij'}$ have disjoint supports. Moreover, we know that these supports are subsets of the support of $m_{2k,n,m}$, which is of size $2k$, hence

$$\#E_{i,n,m} \cdot k = \Big\|\sum_{j \in E_{i,n,m}} a_{ij}\Big\|_0 \le 2k.$$

Hence, whenever $E_{i,n,m}$ is not empty, it contains exactly two distinct elements: $E_{i,n,m} = \{j, j'\}$ where $j \ne j'$.

Further, observe that: a) $a_{ij}$ and $\tilde a_{ij}$ have disjoint supports of size $k$, which are subsets of the support, of size $2k$, of $m_{2k,n,m}$; b) $a_{ij'}$ and $\tilde a_{ij'}$ have the same property. As a result, $\tilde a_{ij}$ and $a_{ij'}$ have the same support, which is disjoint from that of $a_{ij}$. Similarly, $\tilde a_{ij'}$ has the same support as $a_{ij}$. Finally, Eq. (20) can be rewritten $\gamma_{ij} + \gamma_{ij'} = 0$, and implies $b_{ij} + b_{ij'} = 0$, that is to say $\tilde a_{ij} - a_{ij} = a_{ij'} - \tilde a_{ij'}$. We conclude that $\tilde a_{ij} = a_{ij'}$ and